System and method for object tracking

ABSTRACT

A system for tracking an object is disclosed. The exemplary tracking system comprises an input device configured to detect two-dimensional input pixel data from a prop device and a multiprocessor unit configured to calculate three-dimensional position and orientation data associated with the prop device from the two-dimensional input pixel data. An exemplary method for tracking an object is also disclosed. Through this exemplary method, pixel data is received from an input device and edges of an object are defined. Three-dimensional position and orientation data of the object are calculated, wherein the edges of the object are associated with the three-dimensional position and orientation data of the object.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation and claims the priority benefit of U.S. patent application Ser. No. 10/928,778 entitled “System and Method for Object Tracking,” filed Aug. 26, 2004, which is a continuation and claims the priority benefit of U.S. patent application Ser. No. 09/621,578 entitled “Method for Mapping an Object from a Two-Dimensional Camera Image to a Three-Dimensional Space for Controlling Action in a Game Program,” filed Jul. 21, 2000, and now U.S. Pat. No. 6,795,068. The disclosure of this commonly owned application is incorporated herein by reference.

This application is related to U.S. patent application Ser. No. 10/927,918 entitled “Method for Color Transition Detection,” filed Aug. 26, 2004 and now U.S. patent number 7,______, which is a divisional and claims the priority benefit of U.S. patent application Ser. No. 09/621,578 entitled “Method for Mapping an Object from a Two-Dimensional Camera Image to a Three-Dimensional Space for Controlling Action in a Game Program,” filed Jul. 21, 2000, and now U.S. Pat. No. 6,795,068. The disclosure of this commonly owned application is incorporated herein by reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to computer vision systems, and more particularly to a system in which an object is picked up via an individual video camera, the camera image is analyzed to isolate the part of the image pertaining to the object, and the position and orientation of the object are mapped into a three-dimensional space. A three-dimensional description of the object is stored in memory and used for controlling action in a game program, such as rendering of a corresponding virtual object in a scene of a video display.

2. Description of the Related Art

Tracking of moving objects using digital video cameras and processing the video images for producing various displays has been known in the art. One such application, for producing an animated video version of a sporting event, has been disclosed by Segen, U.S. Pat. No. 6,072,504, the disclosure of which is incorporated in the present specification by reference. According to this system, the position of a tennis ball during play is tracked using a plurality of video cameras, and a set of equations relating the three-dimensional points in the court to two-dimensional points (i.e. pixels) of digital images within the field of view of the cameras is employed. Pixel positions of the ball resolved in a given digital image can be related to a specific three-dimensional position of the ball in play and, using triangulation from respective video images, a series of image frames are analyzed by a least-squares method, to fit the positions of the ball to trajectory equations describing unimpeded segments of motion of the ball.

As described in some detail by Segen, once a three-dimensional description of position and motion of an object has been determined, various methods exist which are well known in the art for producing an animated representation thereof using a program which animates appropriate object movement in a video game environment.

Stated otherwise, Segen is concerned with determining the three-dimensional position of an object in motion from a plurality of two-dimensional video images captured at a point in time. Once the three-dimensional position of the “real” object is known, it is then possible to use this information to control a game program in any number of different ways which are generally known to game programmers.

However, the system of Segen relies on a plurality of video cameras for developing positional information about the object based on triangulation. Moreover, the detected object of Segen is a simple sphere which does not require information about the orientation (e.g. inclination) of the object in space. Thus, the system of Segen is not capable of reconstructing position and orientation of an object, whether moving or at rest, from a two-dimensional video image using a single video camera.

It is common for game programs to have virtual objects formed from a combination of three-dimensional geometric shapes, wherein during running of a game program, three-dimensional descriptions (positions and orientations) of the objects relative to each other are determined by control input parameters entered using an input device such as a joystick, game controller or other input device. The three-dimensional position and orientation of the virtual objects are then projected into a two-dimensional display (with background, lighting and shading, texture, and so forth) to create a three-dimensional perspective scene or rendition by means of the rendering processor functions of the game console.

As an example, there can be a “virtual object” that forms a moving image in a game display corresponding to how one moves around the “real” object. To display the virtual object, the calculated three-dimensional information is used for fixing the position and orientation of the “virtual object” in a memory space of the game console, and then rendering of the image is performed by known projection processing to convert the three-dimensional information into a realistic perspective display.

However, in spite of the above knowledge and techniques, problems continue to hinder successful object tracking, and a particularly difficult problem is extracting precisely only those pixels of a video image which correspond unambiguously to an object of interest. For example, although tracking movement of an object having one color against a solid background of another color, where the object and background colors vary distinctly from one another, can be accomplished with relative ease, tracking of objects, even if brightly colored, is not so easy in the case of multi-colored or non-static backgrounds. Changes in lighting also dramatically affect the apparent color of the object as seen by the video camera, and thus object tracking methods which rely on detecting a particular colored object are highly susceptible to error or require constant re-calibration as lighting conditions change. The typical home use environment for video game programs demands much greater flexibility and robustness than is possible with conventional object tracking computer vision systems.

SUMMARY OF THE INVENTION

In one exemplary embodiment of the present invention, an object tracking system is provided. The exemplary tracking system comprises an input device configured to detect two-dimensional input pixel data from a prop device. The system also comprises a multiprocessor unit configured to calculate three-dimensional position and orientation data associated with the prop device from the two-dimensional input pixel data.

The present invention also discloses an exemplary method for tracking an object. Through this exemplary method, pixel data is received from an input device. Edges of an object are defined from the received pixel data, and three-dimensional position and orientation data of the object are calculated, wherein the edges of the object are associated with the three-dimensional position and orientation data of the object.

A machine readable medium having embodied thereon a program being executable by a machine to perform a method for tracking an object is also disclosed herein. That tracking method, in accordance with the present exemplary embodiment, generally corresponds to the aforementioned tracking method.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing an example of a configuration of a main part of a video game console which is adapted to receive an input from a digital video camera.

FIG. 2 is an illustration showing movement of a hand-held prop, as an auxiliary input device, in front of a digital camera for causing corresponding action on a video display of a game program.

FIG. 3 is a block diagram showing the functional blocks required for tracking and discrimination of the prop as it is manipulated by the user.

FIG. 4A illustrates a prop device according to one aspect of the present invention.

FIG. 4B illustrates a process for mapping two-dimensional pixel data of a cylinder corresponding to the prop device shown in FIG. 4A to a three-dimensional space.

FIG. 5A illustrates a prop device according to another aspect of the present invention.

FIG. 5B illustrates a process for mapping two-dimensional pixel data of a combined sphere and cylinder corresponding to the prop device shown in FIG. 5A to a three-dimensional space.

FIG. 6A illustrates a prop device according to still another aspect of the present invention.

FIG. 6B illustrates a process for mapping two-dimensional pixel data of stripes provided on a cylinder corresponding to the prop device shown in FIG. 6A to a three-dimensional space on the basis of color transitions at the stripes.

FIG. 7 illustrates a prop device having a helical stripe thereon, and provides a description of principles of another aspect of the present invention whereby a rotational component of the prop can be determined.

FIGS. 8A and 8B are graphs for describing a two-dimensional chrominance color space, for illustrating principles by which color transitions associated with colored stripes provided on a manipulated object are selected to maximize their detectability.

DETAILED DESCRIPTION

FIG. 1 is a block diagram of a configuration of a main part of a video game console 60 adapted for use with a manipulated object (prop) serving as an alternative input device.

The game console 60 constitutes a component of an overall entertainment system 110 according to the present invention which, as shown in FIG. 1, is equipped with a multiprocessor unit (MPU) 112 for control of the overall system 110, a main memory 114 which is used for various program operations and for storage of data, a vector calculation unit 116 for performing floating point vector calculations necessary for geometry processing, an image processor 120 for generating data based on controls from the MPU 112, and for outputting video signals to a monitor 80 (for example a CRT), a graphics interface (GIF) 122 for carrying out mediation and the like over a transmission bus between the MPU 112 or vector calculation unit 116 and the image processor 120, an input/output port 124 for facilitating reception and transmission of data to and from peripheral devices, an internal OSD functional ROM (OSDROM) 126 constituted by, for example, a flash memory, for performing control of a kernel or the like, and a real time clock 128 having calendar and timer functions.

The main memory 114, vector calculation unit 116, GIF 122, OSDROM 126, real time clock 128, and input/output port 124 are connected to the MPU 112 over a data BUS 130.

Further connected to the BUS 130 is an image processing unit 138 which is a processor for expanding compressed moving images and texture images, thereby developing the image data. For example, the image processing unit 138 can serve functions for decoding and development of bit streams according to the MPEG2 standard format, macroblock decoding, performing inverse discrete cosine transformations, color space conversion, vector quantization and the like.

A sound system is constituted by a sound processing unit SPU 171 for generating musical or other sound effects on the basis of instructions from the MPU 112, a sound buffer 173 into which waveform data may be recorded by the SPU 171, and a speaker 175 for outputting the musical or other sound effects generated by the SPU 171. It should be understood that the speaker 175 may be incorporated as part of the display device 80 or may be provided as a separate audio line-out connection attached to an external speaker 175.

A communications interface 140 is also provided, connected to the BUS 130, which is an interface having functions of input/output of digital data, and for input of digital contents according to the present invention. For example, through the communications interface 140, user input data may be transmitted to, and status data received from, a server terminal on a network. An input device 132 (also known as a controller) for input of data (e.g. key input data or coordinate data) with respect to the entertainment system 110, and an optical disk device 136 for reproduction of the contents of an optical disk 70, for example a CD-ROM or the like on which various programs and data (i.e. data concerning objects, texture data and the like) are recorded, are connected to the input/output port 124.

As a further extension or alternative to the input device, the present invention includes a digital video camera 190 which is connected to the input/output port 124. The input/output port 124 may be embodied by one or more input interfaces, including serial and USB interfaces, wherein the digital video camera 190 may advantageously make use of the USB input or any other conventional interface appropriate for use with the camera 190.

The above-mentioned image processor 120 includes a rendering engine 170, a memory interface 172, an image memory 174 and a display control device 176 (e.g. a programmable CRT controller, or the like).

The rendering engine 170 executes operations for rendering of predetermined image data in the image memory, through the memory interface 172, and in correspondence with rendering commands which are supplied from the MPU 112.

A first BUS 178 is connected between the memory interface 172 and the rendering engine 170, and a second BUS 180 is connected between the memory interface 172 and the image memory 174. The first BUS 178 and second BUS 180, respectively, have a bit width of, for example, 128 bits, and the rendering engine 170 is capable of executing high speed rendering processing with respect to the image memory.

The rendering engine 170 has the capability of rendering, in real time, image data of 320×240 pixels or 640×480 pixels, conforming to, for example, NTSC or PAL standards, and more specifically, at a rate greater than ten to several tens of times per interval of from 1/60 to 1/30 of a second.

The image memory 174 employs a unified memory structure in which, for example, a texture rendering region and a display rendering region can be set in a uniform area.

The display controller 176 is structured so as to write the texture data which has been retrieved from the optical disk 70 through the optical disk device 136, or texture data which has been created on the main memory 114, to the texture rendering region of the image memory 174, via the memory interface 172, and then to read out, via the memory interface 172, image data which has been rendered in the display rendering region of the image memory 174, outputting the same to the monitor 80 whereby it is displayed on a screen thereof.

There shall now be described, with reference to FIG. 2, an overall system configuration by which a user holding a prop object manipulates the object in front of a digital video camera, for causing an action to occur in a video game.

As shown in FIG. 2, the prop may comprise a stick-like object which is made up of a handle 303 which is typically black in color, and a brightly colored cylinder (i.e. having a saturated color) 301. A user stands in front of the video camera 190, which may comprise a USB webcam or a digital camcorder connected to an input/output port 124 of a game console 60 such as the “Playstation 2” manufactured by Sony Computer Entertainment Inc. As the user moves the object in front of the camera 190, the features of the object relating to the cylinder are picked up by the camera 190, and processing (to be described later) is performed in order to isolate and discriminate a pixel group corresponding only to the cylinder. A three-dimensional description of the cylinder, including its position and orientation in three-dimensional space, is calculated, and this description is correspondingly stored in the main memory 114 of the game console 60. Then, using rendering techniques known in the art, the three-dimensional description of the object is used to cause action in a game program which is displayed on the display screen of the monitor 80. For example, a virtual object, shown as a torch for example, can be moved throughout the scene of the game, corresponding to the movements of the real object made by the user. As the user changes the position and orientation of the object by moving it, the three-dimensional description of the object in the memory 114, and a corresponding rendering of the object in the rendering area of image memory 174, are continuously updated so that the position and orientation of the virtual object, or torch, on the monitor 80 changes as well.

As noted above, the essential information which must be provided is a three-dimensional description of the object, which in the case of FIG. 2 is a cylinder. However, the image which is picked up by the camera provides only two-dimensional pixel information about the object. Moreover, it is necessary to discriminate the pixels which relate only to the object itself before a three-dimensional description thereof can be calculated.

FIG. 3 is a block diagram showing the functional blocks used to track and discriminate a pixel group corresponding to the prop as it is being manipulated by the user. It shall be understood that the functions depicted by the blocks are implemented by software which is executed by the MPU 112 in the game console 60. Moreover, not all of the functions indicated by the blocks in FIG. 3 are used for each embodiment. In particular, color transition localization is used only in the embodiment described in relation to FIGS. 6A and 6B, which shall be discussed below.

Initially the pixel data input from the camera is supplied to the game console 60 through the input/output port interface 124, enabling the following processes to be performed thereon. First, as each pixel of the image is sampled, for example, on a raster basis, a color segmentation processing step S201 is performed, whereby the color of each pixel is determined and the image is divided into various two-dimensional segments of different colors. Next, for certain embodiments, a color transition localization step S203 is performed, whereby regions where segments of different colors adjoin are more specifically determined, thereby defining the locations of the image in which distinct color transitions occur. Then, a step for geometry processing S205 is performed which, depending on the embodiment, comprises either an edge detection process or performing calculations for area statistics, to thereby define in algebraic or geometric terms the lines, curves and/or polygons corresponding to the edges of the object of interest. For example, in the case of the cylinder shown in FIG. 2, the pixel area will comprise a generally rectangular shape corresponding to an orthogonal frontal view of the cylinder. From the algebraic or geometric description of the rectangle, it is possible to define the center, width, length and two-dimensional orientation of the pixel group corresponding only to the object.
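
By way of illustration only, the following minimal sketch conveys the flavor of the color segmentation step S201 and the geometry processing step S205. It is expressed in Python with NumPy, and the reference colors and distance threshold are hypothetical; none of these choices forms part of the present disclosure.

    import numpy as np

    # Hypothetical reference colors (RGB) for the segmentation step S201.
    REFERENCE_COLORS = {
        "cylinder": np.array([30, 60, 200]),     # a saturated blue prop
        "background": np.array([128, 128, 128]),
    }

    def color_segmentation(image, threshold=80.0):
        """Assign each pixel to the nearest reference color (step S201).

        image: H x W x 3 array of RGB values. Returns a dict of boolean
        masks, one per reference color; pixels farther than `threshold`
        from a reference color are excluded from that color's segment.
        """
        masks = {}
        for name, ref in REFERENCE_COLORS.items():
            dist = np.linalg.norm(image.astype(float) - ref, axis=2)
            masks[name] = dist < threshold
        return masks

    def bounding_geometry(mask):
        """Crude stand-in for geometry processing (step S205): the center,
        width and length of the pixel group contained in a boolean mask."""
        ys, xs = np.nonzero(mask)
        if len(xs) == 0:
            return None
        center = (xs.mean(), ys.mean())
        width = xs.max() - xs.min() + 1
        length = ys.max() - ys.min() + 1
        return center, width, length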

The three-dimensional position and orientation of the object are calculated in step S207, according to algorithms which are to be described in association with the subsequent descriptions of preferred embodiments of the present invention.

Lastly, the data of three-dimensional position and orientation also undergoes a processing step S209 for Kalman filtering to improve performance. Such processing is performed to estimate where the object is going to be at a point in time, and to reject spurious measurements that could not be physically possible and therefore are considered to lie outside the true data set. Another reason for Kalman filtering is that the camera 190 produces images at 30 Hz, whereas the typical display runs at 60 Hz, so Kalman filtering fills the gaps in the data used for controlling action in the game program. Smoothing of discrete data via Kalman filtering is well known in the field of computer vision and hence will not be elaborated on further.
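
By way of illustration only, a minimal one-dimensional constant-velocity Kalman filter of the kind contemplated in step S209 may be sketched as follows; it would be applied independently to each tracked quantity (x, y, Z, θ, ø), and the noise parameters shown are hypothetical.

    import numpy as np

    class ConstantVelocityKalman1D:
        """Minimal 1-D constant-velocity Kalman filter for step S209."""

        def __init__(self, q=1e-3, r=1e-2):
            self.x = np.zeros(2)        # state: [position, velocity]
            self.P = np.eye(2)          # state covariance
            self.Q = q * np.eye(2)      # process noise (hypothetical)
            self.R = r                  # measurement noise (hypothetical)

        def step(self, z, dt):
            # Predict the state forward by dt seconds.
            F = np.array([[1.0, dt], [0.0, 1.0]])
            self.x = F @ self.x
            self.P = F @ self.P @ F.T + self.Q
            if z is not None:           # update only when a measurement exists
                H = np.array([1.0, 0.0])
                y = z - H @ self.x                 # innovation
                S = H @ self.P @ H + self.R
                K = self.P @ H / S                 # Kalman gain
                self.x = self.x + K * y
                self.P = self.P - np.outer(K, H @ self.P)
            return self.x[0]

Running such a filter at 60 Hz and passing z=None on the alternate ticks for which no new 30 Hz camera measurement exists yields precisely the gap-filling behavior described above: the predict step supplies the intermediate estimates.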

In FIG. 4A, a prop which is used according to the first embodiment shall be described, and in FIG. 4B a description is given which explains how three-dimensional information of the position and orientation of the prop of FIG. 4A is derived from a two-dimensional video image thereof.

As shown in FIG. 4A, the prop is a cylindrical body 301 having a single solid color attached to a handle 303 which is preferably black in color. In order to fully define the position and orientation of the object in a three-dimensional space, a position of a given point p, typically the center of the object, in the X-Y plane and a depth quantity Z (i.e. the position of the point p on the Z axis) must be determined, together with angular information of the object in at least two different planes, for example, an inclination θ of the object in the X-Y plane, and an inclination ø of the object in the Y-Z plane. The actual physical length and diameter of the cylinder 301, together with knowledge of the focal length of the camera, may be used for scaling, but are not essential for programming action in a game program since the virtual object shown on the display need not be of the same length and diameter, or even of the same shape, as the prop.

Referring now to FIG. 4B, this figure shows a two-dimensional pixel image 305 of the object produced by the video camera 190. A frontal orthogonal view of the cylindrical object 301 is picked up in the video image, which appears as a generally rectangular pixel group 307, wherein, however, the width of the pixel group can vary along the length l thereof as a result of the object being inclined in the phi ø direction or as a result of the overall distance of the prop from the camera. It will be understood that the inclination in the phi ø direction is not directly visible in the video image 305.

To determine the length, center point, etc. of the pixel group 307 in accordance with the geometry processing step S205 discussed above, known area statistics calculations are used. Area statistics include the area, centroid, moment about the X-axis, moment about the Y-axis, principal moments, and the angle of principal moments, which typically are used for calculating moments of inertia of objects about a certain axis. For example, to determine the moments about the X and Y axes, respectively, if each pixel making up the pixel group is considered to correspond to a particle of a given uniform mass m in making up a thin homogeneous sheet or lamina, then the moments about the x and y axes of a system of n such particles (or pixels) located in a coordinate plane are defined as follows:

M_x = Σ_{k=1}^{n} m·y_k  (1)

M_y = Σ_{k=1}^{n} m·x_k  (2)

The center of mass of this system is located at the point (x̄, ȳ) given by

x̄ = M_y/m,  ȳ = M_x/m  (3)

where m here denotes the total mass of the system. Since every pixel carries the same mass, the centroid reduces to the mean of the pixel coordinates.

Further, assuming the lamina is of a shape having a geometric center, such as the rectangle in the case of FIG. 4B or a circle in the case of FIG. 5B (to be discussed later), the center of mass of such a lamina corresponds to the geometric center. More generally, if one knows the area statistics of the pixel region and, for example, that the two-dimensional shape is a rectangle, one can directly calculate its width, height and orientation. Similar calculations are possible with circular shapes to determine the center point and radius, for example. Representative calculations for cases of rectangles and circles can be found in standard college-level calculus or physics texts.
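
By way of illustration only, the centroid and the principal-axis angle of a pixel group may be computed from its moments as sketched below (Python with NumPy assumed, as before). Treating each pixel as a unit-mass particle per equations (1) through (3), this yields the quantities p and θ used in the following paragraphs.

    import numpy as np

    def area_statistics(mask):
        """Centroid and principal-axis angle of a boolean pixel mask.

        Each pixel is treated as a particle of unit mass, so the centroid
        is the mean pixel coordinate (equations (1)-(3)). The returned
        angle theta is measured from the Y axis, as in the text.
        """
        ys, xs = np.nonzero(mask)
        n = len(xs)
        if n == 0:
            return None
        cx, cy = xs.mean(), ys.mean()              # center of mass
        # Second central moments of the pixel group.
        mu_xx = ((xs - cx) ** 2).sum() / n
        mu_yy = ((ys - cy) ** 2).sum() / n
        mu_xy = ((xs - cx) * (ys - cy)).sum() / n
        # Angle of the principal moment axis, measured from the X axis.
        angle_from_x = 0.5 * np.arctan2(2 * mu_xy, mu_xx - mu_yy)
        theta = np.pi / 2 - angle_from_x           # re-expressed from the Y axis
        return cx, cy, theta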

Because the image 305 is already taken to be in the X-Y plane, the X-Y position of the center point p can be derived directly from the image. Also, the theta θ quantity is taken directly from the image simply by knowing any line l, determined in accordance with the geometry processing step S205 described above, which runs along the longitudinal axis of the pixel group 307 corresponding to the cylinder 301. Typically, a longitudinal line l passing through the center point p is used for this purpose.

Determination of the phi ø quantity requires some additional knowledge about the pixel width w in at least two different locations, wherein the ratio of the width quantities w₁:w₂ provides a value which can be used for determining ø. More specifically, if the cylinder 301 is inclined so that the top end thereof is closer to the camera 190 than the lower end of the cylinder, then, since the lower end of the cylinder is at a greater distance from the camera 190, the pixel width quantity w₂ of the image will have a greater value than the pixel width quantity w₁, and vice versa. The ratio w₁:w₂ is proportional to the inclination ø of the cylinder 301 in the Y-Z plane, and therefore the phi quantity ø can be determined from this ratio. Typically, for better accuracy, a plurality of equidistant measurements of pixel widths between the ends of the pixel group 307 are taken, and averaging is performed to determine the ratio w₁:w₂.

Determination of the depth quantity Z can be done in different ways. However, it is important to recognize that the size and number of pixels making up the two-dimensional pixel group 307 are affected both by the inclination of the object in the ø direction as well as by the actual distance of the physical object from the video camera 190. More specifically, as the object inclines in the ø direction, the apparent length of the object as seen by the video camera tends to shorten, so that the length l of the pixel group shortens as well. However, at the same time, as the object moves farther away from the camera along the Z-axis, the apparent size of the object overall, including its length l, also becomes smaller. Therefore, it is insufficient simply to look at the length l alone as an indicator of how far away from the camera the object is. Stated otherwise, the depth quantity Z must be determined as a function of both l and ø.

However, if the phi quantity ø has already been determined and is known, a phi-weighted value of l, which we may call lø, can be determined, and the pixel length of lø in the image, which changes as the object is moved closer to or farther from the camera while assuming that ø stays constant, then can be used to determine the depth quantity Z, since lø will be proportional to Z.

Another method for determining depth Z is to count the total number of pixels in the pixel group 307 corresponding to the object. As the object gets closer to or farther away from the camera, the number of pixels making up the pixel group 307 increases or decreases, respectively, in proportion to the depth quantity Z. However, again, the number of pixels in the pixel group 307 is also affected by the inclination in the phi ø direction, so the number of pixels N must first be weighted by phi ø to result in a weighted quantity Nø which is used for determining the depth quantity Z based on a proportional relationship between Nø and Z.

Yet another advantageous method for determining the depth quantity Z is to use the average width w_avg of the rectangle, which is calculated as the sum of a given number of width measurements of the rectangle divided by the number of width measurements taken. It should be clear that the average width of the pixel group is affected only by Z and not by the phi-inclination of the cylinder. It is also possible to determine phi ø from the ratio of the total length of the pixel group to the average width (i.e. l:w_avg), and moreover, the sign of the phi-inclination can be determined based on whether w₁ is greater or less than w₂.
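
By way of illustration only, these observations may be combined as sketched below. The calibration constants, the inverse relation between w_avg and Z, and the cosine foreshortening model relating l:w_avg to ø are illustrative assumptions, not taken from the disclosure.

    import numpy as np

    # Hypothetical calibration constants, fixed once for a given camera and
    # prop, e.g. by imaging the prop at a known depth with no inclination.
    K_Z = 2000.0               # w_avg * Z observed at calibration
    ASPECT_AT_ZERO_PHI = 8.0   # l / w_avg when the cylinder is untilted

    def cylinder_depth_and_phi(widths, length):
        """Depth Z and inclination phi of the cylinder from its pixel group.

        widths: equidistant pixel-width samples taken along the group.
        length: overall pixel length l of the group.
        """
        w_avg = float(np.mean(widths))
        z = K_Z / w_avg                   # average width is affected only by Z
        # Foreshortening: l / w_avg shrinks by cos(phi) as the prop tilts.
        ratio = (length / w_avg) / ASPECT_AT_ZERO_PHI
        phi = float(np.arccos(np.clip(ratio, 0.0, 1.0)))
        if widths[0] < widths[-1]:        # sign of phi: which end looks wider
            phi = -phi
        return z, phi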

In FIG. 5A, a prop which is used according to another embodiment shall be described, and in FIG. 5B a description is given which explains how three-dimensional information of the position and orientation of the prop of FIG. 5A is derived from a two-dimensional video image thereof.

The prop according to the second embodiment, similar to the first embodiment shown in FIG. 4A, comprises a cylindrical stick-shaped object; however, in this case, a spherical object 309 of a different color is rigidly affixed to one end of the cylinder 301. In addition, although not shown, a distal end of the cylinder may be provided which protrudes just slightly and is visible from an upper end of the sphere 309. As shall be explained below, the sphere 309 provides a simplified means for determining the depth quantity Z and the inclination of the object in the phi ø direction, which does not require measurement of relative widths of the cylinder 301, and which does not require any weighting of the length quantity by phi ø in order to determine the depth quantity Z.

As shown in FIG. 5B, a pixel group 311 corresponding to the sphere 309 in the image appears as a two-dimensional circle. According to this embodiment, a radius R and center point p_s of the circle are determined according to the area statistics calculations which have already been discussed above. In this case, further, the total number of pixels making up the pixel group 311 of the circle can be counted for giving a pixel area of the circle. It will be appreciated that the circular pixel area will increase as the spherical object 309 comes closer to the camera 190 and vice versa, and therefore, since the total number of pixels in the pixel group 311 making up the circle varies in direct relation to the depth quantity Z, the value for Z can be determined thereby.

It should also be realized that, unlike the cylinder in the previous embodiment, the shape and size of the circular pixel group are not influenced as a result of the phi ø angle of inclination. More specifically, even if the object overall is tilted in the phi direction, the sphere 309 and the pixel group 311 will retain their general shape and, unlike the length of the cylinder 301, will not become foreshortened as a result of such tilting. Therefore, an advantage is obtained in that the total number of pixels of the pixel group making up the circle in the image can always be related to the depth quantity Z and, for determining Z, phi-weighting as in the previous embodiment is not required.
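
By way of illustration only, the sphere-based depth estimate may be sketched as follows. The disclosure relates the pixel count of the circle to Z; under a pinhole-camera model the projected area falls off with the square of the distance, so this sketch adopts an inverse-square calibration, with a hypothetical constant fixed by counting the sphere's pixels once at a known depth.

    import numpy as np

    K_SPHERE = 4.0e6   # hypothetical: pixel_count * Z**2 at calibration

    def sphere_depth(mask):
        """Depth Z from the pixel count of the circular group 311.

        mask: boolean array, True on pixels segmented as the sphere color.
        """
        n_pixels = int(np.count_nonzero(mask))
        if n_pixels == 0:
            return None                  # sphere not visible in this frame
        return float(np.sqrt(K_SPHERE / n_pixels))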

Determination of inclination of the object in the theta θ direction is done directly from the image, just as in the previous embodiment, by determining the angle theta θ between a center longitudinal line of the pixel group 307, corresponding to the cylinder 301, and the Y-axis.

Determining the angle of inclination in the phi ø direction is handled somewhat differently than in the previous embodiment. More specifically, such a quantity can be determined by knowledge of the depth quantity Z, determined as described above, and by the length l between the center point of the circle 311 and the center point of the pixel group 307 which corresponds to the cylinder 301. For any known and fixed depth quantity Z, the length l (as viewed from the perspective of the camera) becomes shorter as the object is tilted in the phi ø direction. Therefore, if the Z quantity is known, it is possible to determine, simply from the length l, the degree of inclination in the phi ø direction, and it is not necessary to calculate a relative width quantity or ratio of widths, as in the embodiment shown by FIGS. 4A and 4B.
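
By way of illustration only, and under the same hedges as before: if a hypothetical calibration constant gives the apparent pixel length of the untilted prop at depth Z, the inclination follows from the cosine foreshortening of the measured length l.

    import numpy as np

    def phi_from_length(l_pixels, z, k_length=1.5e4):
        """Inclination phi from the apparent sphere-to-cylinder length l.

        k_length is a hypothetical calibration constant: apparent length
        in pixels multiplied by depth Z when the prop is untilted.
        """
        l_at_zero_phi = k_length / z                # expected length at phi == 0
        ratio = np.clip(l_pixels / l_at_zero_phi, 0.0, 1.0)
        return float(np.arccos(ratio))              # l = l_0 * cos(phi)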

FIG. 6A illustrates a prop device according to still another aspect of the present invention.

As shown in FIG. 6A, the prop itself comprises a generally cylindrical body 301. In addition, three stripes S₁, S₂ and S₃ having a different color than the cylinder itself are provided on the cylindrical body. Preferably, the stripes S₁, S₂ and S₃ are each equal in width and are spaced equidistant from each other, at either end of the cylinder 301 and in the center thereof.

According to this embodiment, a pixel group making up the cylinder is extracted from the image to provide a two-dimensional line along which to look for color transitions. To determine the quantities Z, θ and ø, positions are determined at which color transitions along any line l in the longitudinal direction of the cylinder 301 occur.

More specifically, as shown in FIG. 6B, a group made up of only those pixels corresponding to a line l along the longitudinal direction of the cylinder body 301, as viewed by the camera, needs to be sampled in order to determine where along the line l distinct color transitions occur. In particular, for detecting such color transitions, the chrominance values Cr and Cb which are output as part of the YCrCb signals from the video camera 190 are detected. For reasons which shall be explained below in connection with the criteria for selecting the stripe colors, it is preferable to use a combined chrominance value D made up of a Pythagorean distance of the combined chrominance signals Cr and Cb for each color of the cylinder 301 and stripes S₁, S₂ and S₃, respectively, thereby defining a separation in the two-dimensional chrominance color space used by the video camera 190, according to the following formula (4):

D = √((ΔCr)² + (ΔCb)²)  (4)

By selecting colors which maximize the value of D (to be explained in more detail later), it is possible to select a threshold Dₜ at which only color transitions above a certain separation, where D > Dₜ, are considered to correspond to the color transitions of the stripes S₁, S₂ and S₃. Accordingly, the pixels along the line of the cylinder are filtered, using such a threshold, in order to find the large color transitions corresponding to the stripes S₁, S₂ and S₃.

As shown in FIG. 6B, at positions along the line l where color transitions occur, two spikes corresponding to positions where color transitions appear can be detected for each stripe, and the center point between these spikes is taken to be the position of the stripe. Once the positions of the stripes are fixed, it is then a matter of course to determine the lengths l₁ and l₂ between the stripes, wherein the overall length of the cylinder is determined by the sum of l₁ and l₂.
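
By way of illustration only, the transition search along the line l may be sketched as follows, given the Cr and Cb samples of the line; the threshold value shown is hypothetical.

    import numpy as np

    def find_stripe_positions(cr, cb, d_t=40.0):
        """Stripe centers from chrominance transitions along the line l.

        cr, cb: 1-D arrays of chrominance samples along the line.
        d_t: threshold on the chrominance separation D of formula (4).
        """
        d_cr = np.diff(cr.astype(float))
        d_cb = np.diff(cb.astype(float))
        d = np.sqrt(d_cr ** 2 + d_cb ** 2)   # formula (4), sample to sample
        spikes = np.nonzero(d > d_t)[0]      # the large color transitions
        # Pair consecutive spikes (entering and leaving a stripe) and take
        # the midpoint of each pair as the stripe position.
        return [(spikes[i] + spikes[i + 1]) / 2.0
                for i in range(0, len(spikes) - 1, 2)]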

It shall next be explained how knowledge of l₁ and l₂ provides sufficient information for determining the quantities of Z, θ and ø, necessary for describing the position and orientation of the object in three dimensions.

First, since the line l defined by the pixels running along the length of the cylinder has already been determined, and since the camera is assumed to face normally to the X-Y plane, the angle θ is taken directly as the angle between the longitudinal line of the cylinder and the Y axis, basically in the same manner as in the preceding embodiments.

For determining the angle of inclination in the phi ø direction, the ratio of the lengths l₁:l₂ is used. For example, in the case (as shown) in which the cylinder is inclined in the ø direction toward the camera 190, with the upper end of the cylinder being closer to the camera than the lower end, the length l₁ will appear longer to the camera 190 (since it is closer) than the length l₂. It will also be appreciated that, although the apparent lengths l₁ and l₂ will also be affected by the overall distance Z of the object from the camera 190, the ratio of these lengths l₁:l₂ will not change and therefore this ratio provides a constant indication of the inclination of the cylinder 301 in the phi ø direction.

For determining the depth quantity Z, a procedure similar to that of the first embodiment is employed, wherein a phi-weighted quantity lø of the total length l (l = l₁ + l₂) is determined for giving Z. More specifically, the influence of the inclination angle ø on the total apparent length l of the object is first determined, and then the total length, properly weighted by the influence of ø, is proportional to the distance (or depth quantity) Z of the object from the camera 190.

Stated more simply, ø is determined from the ratio of l₁ and l₂, and once phi ø is known, the total depth quantity Z can be determined from the sum of l₁ and l₂.
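
By way of illustration only, both steps may be sketched together as follows; the first-order perspective model relating the asymmetry of l₁ and l₂ to ø, and the calibration constants, are assumptions made purely for this sketch.

    import numpy as np

    def striped_cylinder_pose(l1, l2, k_depth=1.5e4, stripe_spacing=0.15):
        """Inclination phi and depth Z from the inter-stripe lengths.

        l1, l2: apparent pixel lengths between adjacent stripes.
        k_depth: hypothetical calibration constant (pixel length * depth).
        stripe_spacing: physical distance between adjacent stripes (meters).
        """
        # The near half appears longer than the far half; the asymmetry of
        # the two lengths encodes the sign and size of the tilt.
        asymmetry = (l1 - l2) / (l1 + l2)
        z0 = k_depth / (l1 + l2)          # rough depth, ignoring the tilt
        dz = asymmetry * z0               # implied depth offset of the halves
        phi = float(np.arcsin(np.clip(dz / stripe_spacing, -1.0, 1.0)))
        # Phi-weighted total length: undo the cos(phi) foreshortening.
        z = k_depth * float(np.cos(phi)) / (l1 + l2)
        return phi, z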

There shall now be described, in connection with FIG. 7, a method for determining a rotational component of the prop. This method may be applied in conjunction with any of the embodiments which have been discussed above, by further equipping the prop device with a helical stripe S_H thereon.

Each of the tracking methods described above can be used to obtain five of the six degrees of freedom of the object. The only one missing is the rotation of the cylinder about its axis. Information about the rotation of the cylinder would seem difficult to obtain because cylinders are symmetric in rotation about this axis. The approach taken by the present invention to obtain this rotational component is to add a helical stripe S_H that goes around the cylinder 301 exactly once. As the cylinder 301 is rotated, the height of the stripe S_H will correspond to the rotation angle.

More specifically, as shown in FIG. 7, the cylinder 301 (or the cylinder part of the prop in the case of FIGS. 5A and 5B) includes the single helical stripe S_H thereon which goes around the object only once. Information pertaining to the helical stripe is extracted, either from the entire pixel group 313 which makes up the helical stripe or by using the color transitions corresponding to the helical stripe S_H, in order to determine, using the geometry processing discussed above, a helix H which best fits the stripe S_H.

In addition to the helix H, a center line l of the pixel group corresponding to the cylinder is determined as described previously. Also, the overall length l of the pixel group is determined.

For obtaining a degree of rotation of the cylinder, various heights h (only h₁ and h₂ are shown for simplicity), each of which defines the distance between one end of the cylinder and the point p where the center line intersects the helix, are determined.

As shown on the right-hand side of FIG. 7, the camera 190 only sees one side (or orthogonal projection) of the cylinder 301 at a time. Accordingly, the helix H determined from the extracted region of the camera image determines the degree of revolution of the cylinder 301. More specifically, as shown, assuming no rotation (i.e. a rotational component of 0 degrees), a center line extending from one end to a point on the helix will have a first height h₁, whereas if the object is rotated by 45 degrees, the height of the center line l between the lower end and the point where it intersects the helix H will have a shorter height h₂. The condition shown by the far right-hand side of FIG. 7, at a rotation of 90 degrees, represents a unique case in which the center line will intersect the helix at two points. Hence, by calculating the heights of the center line l, a component of rotation of the cylinder 301 (or any other object affixed to the cylinder and rotated thereby) can be determined.

The specific quantity used for determining rotation is the ratio of the detected height between the lower end and the point on the helix to the total length l of the pixel group. This ratio gives a number from 0 to k (where k = h_max/l), which maps directly to a range of from 0 to 360 degrees. Thus, additional information with respect to the position and orientation of the cylinder 301 in three-dimensional space can be provided. Such information can be used to control the rotation of a virtual object, for example, when displayed in a game program.
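
By way of illustration only, this mapping is a one-line computation once the helix intersection height has been measured.

    def rotation_from_helix(h, l_total, k):
        """Rotation angle of the cylinder from the helix intersection.

        h: detected height from the lower end to the helix intersection.
        l_total: total pixel length l of the cylinder group.
        k: the per-prop constant h_max / l reached at a full turn.
        Returns the rotation in degrees, in the range [0, 360).
        """
        ratio = (h / l_total) / k       # normalized to the range 0..1
        return (ratio * 360.0) % 360.0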

Next, with respect to FIGS. 8A and 8B, a process for selection of colors for the stripes, according to the embodiments of FIGS. 6A and 6B, shall now be described. More specifically, FIG. 8A shows a diagram of a color space defined by luminance and radial coordinates of hue and saturation. Luminance is the brightness or intensity of the color, hue is the shift in the dominant wavelength of a spectral distribution, and saturation is the concentration of a spectral distribution at one wavelength.

By contrast, FIG. 8B shows a two-dimensional chrominance color space corresponding to the Cr and Cb chrominance output signals of the video camera. It is well understood in the art that video cameras output signals for controlling the color of each pixel making up a video image. As shown by the color wheel diagram of FIG. 8A, color can be defined using radial coordinates corresponding respectively to hue and saturation. However, as it is needlessly complex for computerized image processing to use radial coordinates, another more useful standard for defining color is the YCrCb color definition, which is the most common representation of color used in the video world. YCrCb represents each color by a single luma component (Y) and two components of chrominance, Cr and Cb. Y may be loosely related to brightness or luminance, whereas Cr and Cb make up quantities loosely related to hue. These components are defined more rigorously in ITU-R BT.601-4 (Studio encoding parameters of digital television for standard 4:3 and wide-screen 16:9 aspect ratios) published by the International Telecommunication Union. Thus, the Cr and Cb chrominance signals for each pixel are defined by Cartesian coordinates which also can be used to determine a location within the color wheel corresponding to a certain hue and saturation.

According to the present invention, the color of the stripes S₁, S₂ and S₃ and the color of the cylinder 301 are chosen in such a way as to maximize stripe detectability for the video camera. Color-based tracking is notorious for its difficulties due to changes in lighting, which cause the apparent color to vary. As a result, if one is attempting to detect a certain color of blue corresponding to an object, for example, under certain lighting conditions it is possible for the color of blue, as perceived by the camera, to vary to such a degree that accurate detection of the object is made difficult. In the present invention, by looking for color transitions instead of absolute colors, a more robust tracking solution can be attained. For example, in the embodiment of FIGS. 6A and 6B, if the cylinder 301 is blue and the stripes S₁, S₂ and S₃ are orange, and if lighting conditions change, then the apparent colors will also change. However, the transition between these colors, as shown in FIG. 6B, will still be very evident.

As discussed above, video cameras capture data using the two-dimensional chrominance color space shown in FIG. 8B. By choosing colors for the object and stripes, respectively, which have a maximal separation D in this space, it is possible to significantly enhance the detectability of the color transitions.

More specifically, as shown in FIG. 8B, highly saturated colors of blue and orange are located at substantially diametrically opposed sides of the color wheel and are separated by a large distance D in the color space. The actual distance D can be calculated as the hypotenuse of a triangle having sides defined by ΔCr (i.e. the difference in the Cr chrominance signal values for the two colors of blue and orange) and ΔCb (i.e. the difference in the Cb chrominance signal values for the same two colors), and hence the actual distance D is the square root of (ΔCr)² + (ΔCb)², as already discussed above in equation (4).

Although blue and orange have been described as an example, it will be appreciated that any other color pairs, for example green and magenta, which also possess a large separation in the chrominance color space may be used. In other words, the method provides a general criterion whereby colors may be selected using their chrominance signals Cr and Cb in such a manner as to maximize their separation in the color space.

More specifically, a generally applicable method for the selection of colors, as well as for calculating the distance between any two colors, is performed in such a way that the distance between two colors is calculated as a distance projected onto a certain diameter-spoke of the color wheel. First, a given diameter-spoke on the color wheel is selected having a certain angle of orientation θ. By choosing the angle of orientation of the selected diameter on the color wheel, it is possible to select the color transitions one wants to detect. For example, if green is (1, 1) and magenta is (−1, −1), the diameter of the spoke should be set at an orientation θ of 45 degrees. Then the color separation distance is calculated simply by projecting the colors onto the 45 degree line. In this manner, for the case of green and magenta, the computed distance is exactly the same as the Pythagorean distance D discussed above; however, with a diameter-line orientation of 45 degrees, the distance between blue and orange is zero, because they both project to the origin. This tells us that, for a selected diameter line of 45 degrees, green and magenta are the optimal colors for detection, since they possess the maximum separation in the color space for this diameter.

Thus, for any given diameter angle θ, which can be chosen from 0 to 180 degrees, the separation between two colors (Cr₁, Cb₁) and (Cr₂, Cb₂) may be calculated according to equation (5) as follows:

D = [Cr₁·cos θ + Cb₁·sin θ] − [Cr₂·cos θ + Cb₂·sin θ]  (5)

The distance calculation shown by equation (5) can therefore also be used for setting the threshold Dₜ based on a predetermined orientation defined by the angle θ. For example, if the color transitions for the object were in fact green and magenta, the general distance calculation above can be used for threshold setting, while fixing the angle θ of this equation at 45 degrees.
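
By way of illustration only, equation (5) may be sketched as follows; the example values reproduce the green and magenta case discussed above.

    import math

    def projected_separation(c1, c2, theta_deg):
        """Signed color separation of equation (5): the difference of two
        chrominance points projected onto a diameter at angle theta.

        c1, c2: (Cr, Cb) pairs; theta_deg: diameter angle, 0 to 180 degrees.
        """
        t = math.radians(theta_deg)
        p1 = c1[0] * math.cos(t) + c1[1] * math.sin(t)
        p2 = c2[0] * math.cos(t) + c2[1] * math.sin(t)
        return p1 - p2

    # Green (1, 1) vs magenta (-1, -1) on the 45-degree diameter:
    # projected_separation((1, 1), (-1, -1), 45.0) -> 2.828..., i.e. 2*sqrt(2),
    # matching the Pythagorean distance D; blue and orange project to zero.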

Herein have been described several methods for determining the position and orientation of a real object manipulated in front of a video camera, by mapping the two-dimensional image information of the object captured by the camera to a three-dimensional space, wherein a three-dimensional description including position and orientation of the object may be used to control action in a game program.

Although one clear example of controlling a game program is to have a “virtual object” that forms a moving image in a game display corresponding to how the “real” object is moved or positioned, it will be appreciated that the three-dimensional information can be used to control game programs in any number of different ways foreseeable to persons skilled in the art. For example, a “theremin”-like musical effect can be achieved wherein changes in the position and orientation of the manipulated object could be used to influence volume, tone, pitch, rhythm and so forth of sounds produced by the sound processor. Such a musical or rhythmic sound effect can be provided in combination with visual effects displayed on the screen of the game console, for enhancing the experience perceived by the game player.

It shall be understood that other modifications will be apparent and can be easily made by persons skilled in the art without departing from the scope and spirit of the present invention. Accordingly, the following claims shall not be limited by the descriptions or illustrations set forth herein, but shall be construed to cover with reasonable breadth all features which may be envisioned as equivalents by those skilled in the art.

What is claimed is:

1. An object tracking system comprising: an input device configured to detect two-dimensional input pixel data from a prop device; and a multiprocessor unit configured to calculate three-dimensional position and orientation data associated with the prop device from the two-dimensional input pixel data.

2. The object tracking system of claim 1, wherein the multiprocessor unit comprises a memory configured to store the three-dimensional position and orientation data associated with the prop device.

3. The object tracking system of claim 1, wherein the multiprocessor unit comprises an image processor configured to execute operations for rendering the three-dimensional position and orientation data associated with the prop device.

4. The object tracking system of claim 1, further comprising a monitor for displaying action in a game program caused by rendering the three-dimensional position and orientation data associated with the prop device.

5. The object tracking system of claim 1, wherein the prop device comprises a saturated color.

6. The object tracking system of claim 1, wherein the input device is a camera.

7. The object tracking system of claim 2, wherein the memory further comprises a display rendering region.

8. The object tracking system of claim 2, wherein the memory further comprises a texture rendering region.

9. The object tracking system of claim 1, wherein the multiprocessor unit is further configured to: determine the color of each pixel in the two-dimensional input pixel data; and define edges of an object by dividing an image comprising the input pixel data into two-dimensional segments of color, wherein the defined edges are associated with the three-dimensional position and orientation data of the prop device.

10. The object tracking system of claim 9, wherein the multiprocessor unit is further configured to localize color transitions whereby distinct color transitions are defined prior to defining the edges of the object in the image.

11. The object tracking system of claim 1, further comprising a filter, wherein the filter is configured to filter the three-dimensional position and orientation data.

12. The object tracking system of claim 11, wherein the filter comprises a Kalman filter.

13. The object tracking system of claim 9, wherein the multiprocessor unit is further configured to employ an edge detection process to define the edges of the object.

14. The object tracking system of claim 9, wherein the multiprocessor unit is further configured to employ area statistics calculations to define the edges of the object.

15. The object tracking system of claim 9, wherein the definition of an edge of the object is algebraic.

16. The object tracking system of claim 9, wherein the definition of an edge of the object is geometric.

17. A method for tracking an object, comprising: receiving pixel data from an input device; defining edges of an object from the received pixel data; and calculating three-dimensional position and orientation data of the object, wherein the defined edges are associated with the three-dimensional position and orientation data of the object.

18. The method of claim 17, further comprising localizing color transitions whereby distinct color transitions are defined prior to defining the edges of the object.

19. The method of claim 17, further comprising the application of a Kalman filter to the three-dimensional position and orientation data.

20. The method of claim 17, wherein defining the edges of the object comprises employing an edge detection process.

21. The method of claim 17, wherein defining the edges of the object comprises employing area statistics calculations.

22. A machine readable medium having embodied thereon a program being executable by a machine to perform a method for tracking an object, the method comprising: receiving pixel data from an input device; defining edges of an object from the received pixel data; and calculating three-dimensional position and orientation data of the object, wherein the defined edges are associated with the three-dimensional position and orientation data of the object.